Skip to content

dev branch to finally complete the basic unit handling logic#39

Draft
nevrome wants to merge 27 commits intomainfrom
moreArchchemProgress
Draft

dev branch to finally complete the basic unit handling logic#39
nevrome wants to merge 27 commits intomainfrom
moreArchchemProgress

Conversation

@nevrome
Copy link
Collaborator

@nevrome nevrome commented Feb 23, 2026

I created this as a development branch for @archaeothommy and me (and whoever would like to join) to work on the archchem class and its missing core features (see e.g. #12, #18, #20).

@nevrome nevrome marked this pull request as draft February 23, 2026 15:18
@nevrome
Copy link
Collaborator Author

nevrome commented Feb 25, 2026

What I implemented here so far:

  • removing the unit names from the column names in as_archchem, after they have been parsed and turned into unit types for the respective columns.
  • optionally recovering the unit names in the column names in remove_units. To write an archchem table one can now submit it to remove_units(recover_unit_names = TRUE) %>% readr::write_csv.
  • assigning units to error columns: either [%] or whatever unit the column they are refering to has. To implement that I split colnames_to_constructors into parse_colnames and build_constructors. The output of parse_colnames may serve as a useful intermediate data product for diagnostic purposes.
  • defining independent units for at% and wt% that get loaded with the package, using the install_unit mechanism of the units package. They are independent in the sense that they can not be converted into anything else automatically. We will have to provide conversion functions (as the one you have already implemented).

At this point I think it would be very important to test intensively. Overall this seems to be workable, but it's also a bit brittle. We need to know the edge cases, either to address them, or at least to work around them. So the three big ToDos I see right now are:

  1. Write tests validating the desired behaviour of archchem tables with the code we have right now.
  2. Integrate archchem with other package features e.g. atomic and oxide conversion.
  3. Write a vignette that shows the features and intended uses of archchem tables. Here we can highlight what we get with this setup: A predictable structure with column-wise meta-information, automatic and safe unit conversion, unit-aware ggplot scales etc.

@archaeothommy
Copy link
Owner

Thank you so much, @nevrome, this all looks awesome. I played a bit around with it and it seems to work fine. I lacked the headspace of writing proper tests, sorry.

I would expect that the ID column is created by renaming the respective column rather than making a copy of it. Is there a reason for this behaviour?

@archaeothommy
Copy link
Owner

Found a potential bug:

I tried to connect wt% to the udunits so that except at% and oxide% all units can be handled with the package units (see commented line in zzz.R. However, this resulted in the unit of relative errors being overwritten by the unit of the column with their values. For example, when running

test_file <- system.file("extdata", "test_data_input_good.csv", package = "ASTR")
arch <- read_archchem(test_file, id_column = "Sample", context = 1:7)

results in (focus on SiO2):

<truncated>
 $ Al2O3             : Units: [wtP] num [1:14] 3.92 3.46 5.87 4.33 4.17 3.58 5.6 4.47 3.73 6.27 ...
  ..- attr(*, "archchem_class")= chr "archchem_concentration"
 $ SiO2              : Units: [wtP] num [1:14] 31.6 29.1 30.5 25.7 43.5 ...
  ..- attr(*, "archchem_class")= chr "archchem_concentration"
 $ SiO2_errSD%       : Units: [wtP] num [1:14] 4.3 2.88 5.71 2.22 4.65 3.67 3 2.87 3.44 6.12 ...
  ..- attr(*, "archchem_class")= chr "archchem_error"
 $ P2O5              : Units: [wtP] num [1:14] 0.15 NA 0.11 0.23 0.4 0.29 0.43 0.62 0.24 0.28 ...
  ..- attr(*, "archchem_class")= chr "archchem_concentration"
<truncated>

The behavior occurs here, and deactivating this if-statement in the underlying constructor function disables it (because it disables assignment of units in general).

I was not able to figure out the reason. I suspected that purrr::map2() takes the first occurrence in the look-up table, which is for the values and not the error because both have the same base. However, this also true for the desired behavior.

You have to restart the R session and reload the package to reproduce the bug. It seems that only "load_all()" does not reset the units defined on package load.

@nevrome
Copy link
Collaborator Author

nevrome commented Mar 18, 2026

I looked at your changes with 2cbbc68...a69aebe - I didn't know that there would be so much special notation to be mapped, but good that you have a solid overview here.

I'll look into the bug you describe now. I have the suspicion that I ran into this as well three weeks ago. Looks familiar.

@nevrome
Copy link
Collaborator Author

nevrome commented Mar 18, 2026

Ok - so I can reproduce your observation and I still vaguely remember that I encountered the same issue. Afaik this has something to do with overwriting the unitless unit without a custom base. That's what you effectively do with (0.01 g) / g (as g/g = 1). In this case all unitless units are one and even % in SiO2_errSD% becomes wtP.

Following this issue here, I think there must be a solution. Looking at the help file of units::install_unit there seems to be a way to define a unitless unit with a custom base:

install_unit(symbol = character(0), def = character(0), name = character(0))

def: either

  • an empty definition, which defines a new base unit;
  • "unitless", which defines a new dimensionless unit;
  • a relationship with existing units (see details for the syntax).

So far I have left the definition of "mass_basis" empty. When I instead set it explicitly to unitless with safe_install("mass_basis", "unitless") then I seem to get the right behaviour.

<truncated>
 $ Al2O3             : Units: [wtP] num [1:14] 3.92 3.46 5.87 4.33 4.17 3.58 5.6 4.47 3.73 6.27 ...
  ..- attr(*, "archchem_class")= chr "archchem_concentration"
 $ SiO2              : Units: [wtP] num [1:14] 31.6 29.1 30.5 25.7 43.5 ...
  ..- attr(*, "archchem_class")= chr "archchem_concentration"
 $ SiO2_errSD%       : Units: [%] num [1:14] 4.3 2.88 5.71 2.22 4.65 3.67 3 2.87 3.44 6.12 ...
  ..- attr(*, "archchem_class")= chr "archchem_error"
 $ P2O5              : Units: [wtP] num [1:14] 0.15 NA 0.11 0.23 0.4 0.29 0.43 0.62 0.24 0.28 ...
  ..- attr(*, "archchem_class")= chr "archchem_concentration"
<truncated>

I'm not sure, though, if this covers what you have in mind. You write

I tried to connect wt% to the udunits so that except at% and oxide% all units can be handled with the package units

But we already handle all units with udunits, even at%. It just has this custom base which makes it sort of incompatible with other units. But that is unavoidable, right?

@nevrome
Copy link
Collaborator Author

nevrome commented Mar 18, 2026

I added my change in 3255fa2 with some more comments on this mechanism.

@archaeothommy
Copy link
Owner

But we already handle all units with udunits, even at%. It just has this custom base which makes it sort of incompatible with other units. But that is unavoidable, right?

This was imprecise phrasing from my side, sorry. I meant "udunits doing the conversion", i.e. without the need for dedicated conversion functions. And yes, they are unavoidable for these special cases.

Thanks! I will look into the details and your commit later today.

@archaeothommy
Copy link
Owner

I looked at your changes with 2cbbc68...a69aebe - I didn't know that there would be so much special notation to be mapped, but good that you have a solid overview here.

Welcome to the inconsistency of reporting... some are now outdated, some prefer writing with parentheses and others without. I tried to cover them all 🤷

@archaeothommy
Copy link
Owner

I added my change in 3255fa2 with some more comments on this mechanism.

Works as intended 😊

Thomas Rose added 2 commits March 18, 2026 21:22
This would be handy for columns created in functions to be then included in an archchem object.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants